1.

An analysis of published study designs in PubMed prisoner health abstracts from 1963 to 2023: a text mining study.

Karystianis, George; Lukmanjaya, Wilson; Buchan, Iain; Simpson, Paul; Ginnivan, Natasha; Nenadic, Goran; Butler, Tony.

BMC Med Res Methodol ; 24(1): 68, 2024 Mar 17.

Article in English | MEDLINE | ID: mdl-38494501

ABSTRACT

BACKGROUND: The challenging nature of studies with incarcerated populations and other offender groups can impede the conduct of research, particularly research involving complex study designs such as randomised controlled trials and clinical interventions. Providing an overview of study designs employed in this area can offer insights into this issue and into how research quality may affect health and justice outcomes. METHODS: We used a rule-based approach to extract study designs from a sample of 34,481 PubMed abstracts related to epidemiological criminology published between 1963 and 2023. The results were compared against an accepted hierarchy of scientific evidence. RESULTS: We evaluated our method on a random sample of 100 PubMed abstracts, and it returned an F1-score of 92.2%. Of 34,481 study abstracts, almost 40.0% (13,671) had an extracted study design. The most common study design was observational (37.3%; 5101), while experimental research in the form of trials (randomised, non-randomised) was present in 16.9% (2319). Mapped against the current hierarchy of scientific evidence, 13.7% (1874) of extracted study designs could not be categorised. Among the remaining studies, most were observational (17.2%; 2343), followed by systematic reviews (10.5%; 1432), with randomised controlled trials accounting for 8.7% (1196) of studies and meta-analyses for 1.4% (190). CONCLUSIONS: It is possible to extract epidemiological study designs computationally from a large-scale PubMed sample. However, the number of trials, systematic reviews, and meta-analyses is relatively small: just 1 in 5 articles. Despite an increase over time in the total number of articles, study design details were often missing from the abstracts. Epidemiological criminology still lacks the experimental evidence needed to address the health needs of prisoners and offenders, a marginalised and isolated population.

Subject(s)

Criminals , Prisoners , Humans , Data Mining , Research Design
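The abstract describes a rule-based extractor whose output is mapped onto an evidence hierarchy. A minimal Python sketch of that idea follows; the regular expressions, the `DESIGN_PATTERNS` and `EVIDENCE_LEVEL` tables, and the function names are hypothetical illustrations, not the authors' rules or the hierarchy they used.

```python
import re
from typing import Optional

# Hypothetical surface patterns for a few study designs. The study's own
# rule set is not described in the abstract; these are illustrative only.
DESIGN_PATTERNS = {
    "meta-analysis": re.compile(r"\bmeta-?analys[ie]s\b", re.I),
    "systematic review": re.compile(r"\bsystematic (?:literature )?reviews?\b", re.I),
    # The lookbehind stops "non-randomised trial" counting as an RCT;
    # non-randomised trials would need a rule of their own.
    "randomised controlled trial": re.compile(
        r"(?<!non-)\brandomi[sz]ed,? (?:controlled |control )?trials?\b", re.I),
    "cohort study": re.compile(r"\bcohort stud(?:y|ies)\b", re.I),
    "case-control study": re.compile(r"\bcase-control stud(?:y|ies)\b", re.I),
    "cross-sectional study": re.compile(r"\bcross-sectional (?:study|studies|survey)\b", re.I),
}

# Assumed evidence ranking (1 = strongest); not the paper's exact hierarchy.
EVIDENCE_LEVEL = {
    "meta-analysis": 1,
    "systematic review": 1,
    "randomised controlled trial": 2,
    "cohort study": 3,
    "case-control study": 4,
    "cross-sectional study": 5,
}


def extract_designs(abstract: str) -> list[str]:
    """Return the sorted names of every design pattern found in the abstract."""
    return sorted(name for name, pattern in DESIGN_PATTERNS.items()
                  if pattern.search(abstract))


def strongest_level(designs: list[str]) -> Optional[int]:
    """Map extracted designs onto the hierarchy; None if nothing maps."""
    levels = [EVIDENCE_LEVEL[d] for d in designs if d in EVIDENCE_LEVEL]
    return min(levels) if levels else None
```

An abstract matching no pattern returns an empty list and maps to `None`, which corresponds loosely to abstracts with no extracted study design.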

2.

Automatic Extraction of Research Themes in Epidemiological Criminology From PubMed Abstracts From 1946 to 2020: Text Mining Study.

Karystianis, George; Simpson, Paul; Lukmanjaya, Wilson; Ginnivan, Natasha; Nenadic, Goran; Buchan, Iain; Butler, Tony.

JMIR Form Res ; 7: e49721, 2023 Sep 22.

Article in English | MEDLINE | ID: mdl-37738080

ABSTRACT

BACKGROUND: The emerging field of epidemiological criminology studies the intersection between public health and justice systems. To increase the value of and reduce waste in research activities in this area, it is important to perform transparent research priority setting that considers the needs of research beneficiaries and end users, along with a systematic assessment of existing research activities to address gaps and harness opportunities. OBJECTIVE: In this study, we aimed to examine published research outputs in epidemiological criminology to assess gaps between published outputs and current research priorities identified by prison stakeholders. METHODS: A rule-based method was applied to 23,904 PubMed epidemiological criminology abstracts to extract the study determinants and outcomes (ie, "themes"). These were mapped against the research priorities identified by Australian prison stakeholders to assess how they differed from research outputs. The income level of each first author's affiliation country was also identified to compare the ranking of research priorities in countries categorized by income level. RESULTS: On an evaluation set of 100 abstracts, the identification of themes returned an F1-score of 90%, indicating reliable performance. More than 53.3% (11,927/22,361) of the articles had at least 1 extracted theme; the most common was substance use (1533/11,814, 12.97%), followed by HIV (1493/11,814, 12.64%). The infectious disease category (2949/11,814, 24.96%) was the most common research priority category, followed by mental health (2840/11,814, 24.04%) and alcohol and other drug use (2433/11,814, 20.59%). A comparison between the extracted themes and the stakeholder priorities showed alignment for mental health, infectious diseases, and alcohol and other drug use. Although behavior- and juvenile-related themes were common, they did not feature as prison priorities.
Most studies were conducted in high-income countries (10,083/11,814, 85.35%), while countries with the lowest income status focused half of their research on infectious diseases (47/91, 52%). CONCLUSIONS: The identification of research themes from PubMed epidemiological criminology research abstracts is possible through the application of a rule-based text mining method. The frequency of the investigated themes may reflect historical developments concerning disease prevalence, treatment advances, and the social understanding of illness and incarcerated populations. The differences between income status groups are likely to be explained by local health priorities and immediate health risks. Notable gaps between stakeholder research priorities and research outputs concerned themes that were more focused on social factors and systems and may reflect publication bias or self-publication selection, highlighting the need for further research on prison health services and the social determinants of health. Different jurisdictions, countries, and regions should undertake similar systematic and transparent research priority-setting processes.
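Both this study and the previous one report an F1-score on a 100-abstract evaluation set. F1 is the harmonic mean of precision and recall; the sketch below assumes scores are pooled (micro-averaged) over (abstract, theme) pairs, which the abstract does not state, and the annotations are invented for illustration.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Micro-averaged scores over (abstract_id, theme) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for three abstracts (not data from the study).
gold = {(1, "HIV"), (1, "substance use"), (2, "mental health"), (3, "HIV")}
predicted = {(1, "HIV"), (2, "mental health"), (3, "hepatitis C")}
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# precision=0.667 recall=0.500 F1=0.571
```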

3.

An Analysis of PubMed Abstracts From 1946 to 2021 to Identify Organizational Affiliations in Epidemiological Criminology: Descriptive Study.

Karystianis, George; Lukmanjaya, Wilson; Simpson, Paul; Schofield, Peter; Ginnivan, Natasha; Nenadic, Goran; van Leeuwen, Marina; Buchan, Iain; Butler, Tony.

Interact J Med Res ; 11(2): e42891, 2022 Dec 05.

Article in English | MEDLINE | ID: mdl-36469411

ABSTRACT

BACKGROUND: Epidemiological criminology refers to health issues affecting incarcerated and nonincarcerated offender populations, a group with whom research is recognized as challenging to conduct. Notwithstanding this, an urgent need exists for new knowledge and interventions to improve health, justice, and social outcomes for this marginalized population. OBJECTIVE: To better understand research outputs in the field of epidemiological criminology, we examined lead authors' affiliations in peer-reviewed published outputs to determine the countries and organizations (eg, universities, governmental and nongovernmental organizations) responsible for peer-reviewed publications. METHODS: We used a semiautomated approach to examine the first-author affiliations of 23,904 PubMed epidemiological studies related to incarcerated and offender populations published in English between 1946 and 2021. We also mapped research outputs to the World Justice Project Rule of Law Index to better understand whether there was a relationship between research outputs and the overall standard of a country's justice system. RESULTS: Nordic countries (Sweden, Norway, Finland, and Denmark) had the highest research outputs proportional to their incarcerated population, followed by Australia. University-affiliated first authors accounted for 73.3% of published articles, with the Karolinska Institute (Sweden) being the most published institution, followed by the University of New South Wales (Australia). Government-affiliated first authors accounted for 8.9% of published outputs, and prison-affiliated groups for 1%. Countries with the lowest research outputs also had the lowest scores on the Rule of Law Index. CONCLUSIONS: This study provides important information on who is publishing research in the epidemiological criminology field.
This has implications for promoting research diversity, independence, funding equity, and partnerships between universities and government departments that control access to incarcerated and offending populations.
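Two calculations underlie these results: sorting affiliation strings into organization types, and scaling publication counts by incarcerated population. The sketch assumes a simple ordered keyword lookup, which is only a stand-in for the semiautomated approach described; the keyword lists, function names, and country figures are hypothetical.

```python
# Hypothetical keyword lists, checked in order, so that a university department
# ("Department of Psychiatry, University of X") is not mistaken for government.
AFFILIATION_RULES = [
    ("university", ("universit", "karolinska", "college")),
    ("prison", ("prison", "correction", "custod")),
    ("government", ("ministry", "department of", "government")),
]


def classify_affiliation(affiliation: str) -> str:
    """Return the first organization type whose keywords appear in the string."""
    text = affiliation.lower()
    for category, keywords in AFFILIATION_RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def outputs_per_1000_prisoners(publications: dict[str, int],
                               prison_population: dict[str, int]) -> dict[str, float]:
    """Normalise publication counts by incarcerated population; skip unknown or zero."""
    return {country: 1000 * count / prison_population[country]
            for country, count in publications.items()
            if prison_population.get(country)}
```

Ranking countries by the normalised rate rather than by raw counts is what lets a small country with few prisoners rank above a large one.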

4.

Mental Illness Concordance Between Hospital Clinical Records and Mentions in Domestic Violence Police Narratives: Data Linkage Study.

Karystianis, George; Cabral, Rina Carines; Adily, Armita; Lukmanjaya, Wilson; Schofield, Peter; Buchan, Iain; Nenadic, Goran; Butler, Tony.

JMIR Form Res ; 6(10): e39373, 2022 Oct 20.

Article in English | MEDLINE | ID: mdl-36264613

ABSTRACT

BACKGROUND: To better understand domestic violence, data sources from multiple sectors such as police, justice, health, and welfare are needed. Linking police data to data collections from other agencies could provide unique insights and promote an all-of-government response to domestic violence. The New South Wales Police Force attends domestic violence events and records information in the form of both structured data and a free-text narrative, with the latter shown to be a rich source of information on the mental health status of persons of interest (POIs) and victims, abuse types, and sustained injuries. OBJECTIVE: This study aims to examine the concordance (ie, matching) between mental illness mentions extracted from the police's event narratives and mental health diagnoses from hospital and emergency department records. METHODS: We applied a rule-based text mining method on 416,441 domestic violence police event narratives between December 2005 and January 2016 to identify mental illness mentions for POIs and victims. Using different window periods (1, 3, 6, and 12 months) before and after a domestic violence event, we linked the extracted mental illness mentions of victims and POIs to clinical records from the Emergency Department Data Collection and the Admitted Patient Data Collection in New South Wales, Australia using a unique identifier for each individual in the same cohort. RESULTS: Using a 2-year window period (ie, 12 months before and after the domestic violence event), less than 1% (3020/416,441, 0.73%) of events had a mental illness mention and also a corresponding hospital record. About 16% of domestic violence events for both POIs (382/2395, 15.95%) and victims (101/631, 16.01%) had an agreement between hospital records and police narrative mentions of mental illness. A total of 51,025/416,441 (12.25%) events for POIs and 14,802/416,441 (3.55%) events for victims had mental illness mentions in their narratives but no hospital record. 
Only 841 events for POIs and 919 events for victims had a documented hospital record within 48 hours of the domestic violence event. CONCLUSIONS: Our findings suggest that current surveillance systems used to report on domestic violence may be enhanced by accessing rich information (ie, mental illness) contained in police text narratives, made available for both POIs and victims through the application of text mining. Additional insights can be gained by linkage to other health and welfare data collections.
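The window-period linkage can be illustrated as follows. The sketch assumes events and diagnoses have already been joined on the linkage identifier, approximates a month as 30 days, and uses hypothetical names (`link_events`, `has_record_in_window`) and made-up people; it is not the study's linkage procedure.

```python
from datetime import date, timedelta


def has_record_in_window(event_date: date, record_dates: list[date], months: int) -> bool:
    """True if any record falls within +/- `months` of the event.
    Months are approximated as 30-day blocks (an assumption)."""
    window = timedelta(days=30 * months)
    return any(abs(d - event_date) <= window for d in record_dates)


def link_events(events, diagnoses, months):
    """events: (person_id, event_date, narrative_mentions_mental_illness) tuples.
    diagnoses: person_id -> dates of hospital/ED mental health diagnoses."""
    counts = {"both": 0, "mention_only": 0, "record_only": 0}
    for person_id, event_date, mentioned in events:
        linked = has_record_in_window(event_date, diagnoses.get(person_id, []), months)
        if mentioned and linked:
            counts["both"] += 1
        elif mentioned:
            counts["mention_only"] += 1
        elif linked:
            counts["record_only"] += 1
    return counts


# Hypothetical people and dates.
events = [("p1", date(2010, 5, 1), True),
          ("p2", date(2010, 5, 1), True),
          ("p3", date(2010, 5, 1), False)]
diagnoses = {"p1": [date(2010, 8, 1)], "p3": [date(2010, 4, 20)]}
```

Widening the window from 3 to 6 months moves p1 from "mention only" to "both", which is why concordance depends on the window chosen.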

5.

Domestic Violence in Residential Care Facilities in New South Wales, Australia: A Text Mining Study.

Withall, Adrienne; Karystianis, George; Duncan, Dayna; Hwang, Ye In; Kidane, Amanuel Hagos; Butler, Tony.

Gerontologist ; 62(2): 223-231, 2022 Feb 09.

Article in English | MEDLINE | ID: mdl-34023902

ABSTRACT

BACKGROUND AND OBJECTIVES: The police are often the first to attend domestic violence events in New South Wales (NSW), Australia, recording related details as structured information (e.g., date of the event, type of incident, premises type) and text narratives which contain important information (e.g., mental health status, abuse types) for victims and perpetrators. This study examined the characteristics of victims and persons of interest (POIs) suspected and/or charged with perpetrating a domestic violence-related crime in residential care facilities. RESEARCH DESIGN AND METHODS: The study employed a text mining method that extracted key information from 700 police-recorded domestic violence events in NSW residential care facilities. RESULTS: Victims were mostly female (65.4%) and older adults (median age 80.3). POIs were predominantly male (67.0%) and were younger than the victims (median age 57.0). While low rates of mental illnesses were recorded (29.1% in victims; 17.4% in POIs), "dementia" was the most common condition among POIs (55.7%) and victims (73.0%). "Physical abuse" was the most common abuse type (80.2%) with "bruising" the most common injury (36.8%). The most common relationship between perpetrator and victim was "carer" (76.6%). DISCUSSION AND IMPLICATIONS: These findings highlight the opportunity provided by police text-based data to offer insights into elder abuse within residential care facilities.

Subject(s)

Crime Victims , Domestic Violence , Aged , Aged, 80 and over , Australia , Data Mining/methods , Female , Humans , Male , New South Wales/epidemiology , Police

6.

Nonfatal Strangulation During Domestic Violence Events in New South Wales: Prevalence and Characteristics Using Text Mining Study of Police Narratives.

Wilson, Mandy; Spike, Erin; Karystianis, George; Butler, Tony.

Violence Against Women ; 28(10): 2259-2285, 2022 08.

Article in English | MEDLINE | ID: mdl-34581646

ABSTRACT

Nonfatal strangulation (NFS) is a common form of domestic violence (DV) that frequently leaves no visible signs of injury and can be a portent for future fatality. A validated text mining approach was used to analyze a police dataset of 182,949 DV events for the presence of NFS. Results confirmed that NFS within intimate partner relationships is a gendered form of violence. The presence of injury and/or other (non-NFS) forms of physical abuse, emotional/verbal/social abuse, and the perpetrator threatening to kill the victim were associated with significantly higher odds of NFS perpetration. Police data contain rich information that can be accessed using automated methodologies such as text mining to add to our understanding of this pressing public health issue.

Subject(s)

Domestic Violence , Intimate Partner Violence , Data Mining/methods , Humans , New South Wales , Police , Prevalence
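Higher odds of NFS given a factor such as threats to kill can be summarized, in the simplest case, as a crude odds ratio from a 2x2 table. The study may well have used adjusted estimates (for example, from logistic regression), which the abstract does not specify; the function name and the counts below are hypothetical.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio from a 2x2 table with a Woolf (log-scale) confidence interval.

    a: factor present, NFS present    b: factor present, NFS absent
    c: factor absent,  NFS present    d: factor absent,  NFS absent
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: add a continuity correction or use exact methods")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts (not from the study): threat to kill vs NFS.
print(odds_ratio_with_ci(60, 140, 40, 760))
```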

7.

Utilizing Text Mining, Data Linkage and Deep Learning in Police and Health Records to Predict Future Offenses in Family and Domestic Violence.

Karystianis, George; Cabral, Rina Carines; Han, Soyeon Caren; Poon, Josiah; Butler, Tony.

Front Digit Health ; 3: 602683, 2021.

Article in English | MEDLINE | ID: mdl-34713088

ABSTRACT

Family and domestic violence (FDV) is a global problem with significant social, economic, and health consequences for victims, including increased health care costs, mental trauma, and social stigmatization. In Australia, the estimated annual cost of FDV is $22 billion, with one woman being murdered by a current or former partner every week. Despite this, tools that can predict future FDV based on the features of the person of interest (POI) and victim are lacking. The New South Wales Police Force attends thousands of FDV events each year and records details as fixed fields (e.g., demographic information for individuals involved in the event) and as text narratives that describe abuse types, victim injuries, threats, and the mental health status of POIs and victims. This information within the narratives is mostly untapped for research and reporting purposes. After applying a text mining methodology to extract information from 492,393 FDV event narratives (abuse types, victim injuries, mental illness mentions), we linked these characteristics with the respective fixed fields and with actual mental health diagnoses obtained from the NSW Ministry of Health for the same cohort to form a comprehensive FDV dataset. These data were input into five deep learning models (MLP, LSTM, Bi-LSTM, Bi-GRU, BERT) to predict three FDV offense types ("hands-on," "hands-off," "Apprehended Domestic Violence Order (ADVO) breach"). The transformer model with BERT embeddings returned the best performance (69.00% accuracy; 66.76% ROC) for "ADVO breach" in a multilabel classification setup, while the binary classification setup generated similar results. "Hands-off" offenses proved the hardest offense type to predict (60.72% accuracy; 57.86% ROC using BERT) but showed potential to improve with fine-tuning of binary classification setups.
"Hands-on" offenses benefited least from the contextual information provided by BERT embeddings: MLP with categorical embeddings outperformed BERT in three out of four metrics (65.95% accuracy; 78.03% F1-score; 70.00% precision). The encouraging results indicate that future FDV offenses can be predicted using deep learning on a large corpus of police and health data. Incorporating additional data sources is likely to improve performance further, which could assist those working on FDV and law enforcement to improve outcomes and better manage FDV events.

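The abstract reports "ROC" as a percentage, which is assumed here to mean the area under the ROC curve (AUC). The sketch computes AUC from its pairwise (Mann-Whitney) definition and, for a multilabel setup, one AUC per offense type; the function names, labels, and scores are hypothetical, and the quadratic loop is meant for clarity rather than for half a million events.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count half), i.e. the Mann-Whitney formulation."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def per_label_auc(label_rows: list[list[int]], score_rows: list[list[float]],
                  names: list[str]) -> dict[str, float]:
    """Multilabel setup: one AUC per offense type, computed column by column."""
    return {name: roc_auc([row[j] for row in label_rows],
                          [row[j] for row in score_rows])
            for j, name in enumerate(names)}


# Hypothetical events; columns are hands_on, hands_off, advo_breach.
label_rows = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
score_rows = [[0.9, 0.2, 0.8], [0.4, 0.6, 0.9], [0.5, 0.3, 0.7], [0.1, 0.7, 0.2]]
print(per_label_auc(label_rows, score_rows, ["hands_on", "hands_off", "advo_breach"]))
```

A binary setup trains and scores one offense type at a time, so it reduces to a single `roc_auc` call per model.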
8.

Surveillance of Domestic Violence Using Text Mining Outputs From Australian Police Records.

Karystianis, George; Adily, Armita; Schofield, Peter W; Wand, Handan; Lukmanjaya, Wilson; Buchan, Iain; Nenadic, Goran; Butler, Tony.

Front Psychiatry ; 12: 787792, 2021.

Article in English | MEDLINE | ID: mdl-35222105

ABSTRACT

In Australia, domestic violence reports are mostly based on data from the police, courts, hospitals, and ad hoc surveys. However, gaps exist in reporting information such as victim injuries, mental health status, and abuse types. The police record details of domestic violence events as structured information (e.g., gender, postcode, ethnicity), but also in text narratives describing other details such as injuries, substance use, and mental health status. However, the voluminous nature of the narratives has prevented their use for surveillance purposes. We used a validated text mining methodology on 492,393 police-attended domestic violence event narratives from 2005 to 2016 to extract mental health mentions for persons of interest (POIs) (individuals suspected of or charged with a domestic violence offense) and victims, as well as abuse types and victim injuries. A significant increase was observed in the proportion of events that recorded an injury type (28.3% in 2005 to 35.6% in 2016). The pattern of injury and abuse types differed between male and female victims, with male victims more likely to be punched and to experience cuts and bleeding, and female victims more likely to be grabbed and pushed and to have bruises. The four most common mental illnesses (alcohol abuse, bipolar disorder, depression, schizophrenia) were the same in male and female POIs. The proportion of events with a reported mental illness increased from 5.0% in 2005 to 24.3% in 2016, and depression among female victims also increased over this period. These findings demonstrate that extracting information from police narratives can provide novel insights into domestic violence patterns, including confounding factors (e.g., mental illness), and thus enable policy responses to address this significant public health problem.

9.

Prevalence of Mental Illnesses in Domestic Violence Police Records: Text Mining Study.

Karystianis, George; Simpson, Annabeth; Adily, Armita; Schofield, Peter; Greenberg, David; Wand, Handan; Nenadic, Goran; Butler, Tony.

J Med Internet Res ; 22(12): e23725, 2020 12 24.

Article in English | MEDLINE | ID: mdl-33361056

ABSTRACT

BACKGROUND: The New South Wales Police Force (NSWPF) records details of significant numbers of domestic violence (DV) events it attends each year as both structured quantitative data and unstructured free text. Accessing information contained in the free text, such as the mental health status of victims and persons of interest (POIs), could be useful in the better management of DV events attended by the police and thus improve health, justice, and social outcomes. OBJECTIVE: The aim of this study is to present the prevalence of extracted mental illness mentions for POIs and victims in police-recorded DV events. METHODS: We applied a knowledge-driven text mining method to recognize mental illness mentions for victims and POIs from police-recorded DV events. RESULTS: In 416,441 police-recorded DV events with single POIs and single victims, we identified 64,587 events (15.51%) with at least one mental illness mention versus 4295 (1.03%) recorded in the structured fixed fields. More than three-quarters (67,582/85,880, 78.69%) of mental illnesses were associated with POIs versus 21.30% (18,298/85,880) with victims; depression was the most common condition in both victims (2822/12,589, 22.42%) and POIs (7496/39,269, 19.01%). Mental illnesses were most common among POIs aged 0-14 years (623/1612, 38.65%) and in victims aged over 65 years (1227/22,873, 5.36%). CONCLUSIONS: A wealth of mental illness information exists within police-recorded DV events that can be extracted using text mining. The results showed mood-related illnesses were the most common in both victims and POIs. Further investigation is required to determine the reliability of the mental illness mentions against sources of diagnostic information.

Subject(s)

Data Mining/methods , Domestic Violence/psychology , Mental Disorders/epidemiology , Police/ethics , Adolescent , Adult , Female , Humans , Male , Prevalence , Reproducibility of Results , Young Adult

10.

Correction: Automatic Extraction of Mental Health Disorders From Domestic Violence Police Narratives: Text Mining Study.

Karystianis, George; Adily, Armita; Schofield, Peter; Knight, Lee; Galdon, Clara; Greenberg, David; Jorm, Louisa; Nenadic, Goran; Butler, Tony.

J Med Internet Res ; 21(4): e13007, 2019 Apr 05.

Article in English | MEDLINE | ID: mdl-30951492

ABSTRACT

[This corrects the article DOI: 10.2196/11548.].

11.

Automated Analysis of Domestic Violence Police Reports to Explore Abuse Types and Victim Injuries: Text Mining Study.

Karystianis, George; Adily, Armita; Schofield, Peter W; Greenberg, David; Jorm, Louisa; Nenadic, Goran; Butler, Tony.

J Med Internet Res ; 21(3): e13067, 2019 03 12.

Article in English | MEDLINE | ID: mdl-30860490

ABSTRACT

BACKGROUND: The police attend numerous domestic violence events each year, recording details of these events as both structured (coded) data and unstructured free-text narratives. Abuse types (including physical, psychological, emotional, and financial) perpetrated by persons of interest (POIs), along with any injuries sustained by victims, are typically recorded in long descriptive narratives. OBJECTIVE: We aimed to determine if an automated text mining method could identify abuse types and any injuries sustained by domestic violence victims in narratives contained in a large police dataset from the New South Wales Police Force. METHODS: We used a training set of 200 recorded domestic violence events to design a knowledge-driven approach based on syntactical patterns in the text and then applied this approach to a large set of police reports. RESULTS: Testing our approach on an evaluation set of 100 domestic violence events provided precision values of 90.2% and 85.0% for abuse type and victim injuries, respectively. In a set of 492,393 domestic violence reports, we found that 71.32% (351,178) of events mentioned at least one abuse type and more than one-third (177,117 events; 35.97%) contained victim injuries. "Emotional/verbal abuse" (33.46%; 117,488) was the most common abuse type, followed by "punching" (86,322 events; 24.58%) and "property damage" (22.27%; 78,203 events). "Bruising" was the most common form of injury sustained (51,455 events; 29.03%), with "cut/abrasion" (28.93%; 51,284 events) and "red marks/signs" (23.71%; 42,038 events) ranking second and third, respectively. CONCLUSIONS: The results suggest that text mining can automatically extract information from police-recorded domestic violence events that can support further public health research into domestic violence, such as examining the relationship of abuse types with victim injuries and of gender and abuse types with risk escalation for victims of domestic violence.
Potential also exists for this extracted information to be linked to information on mental health status.

Subject(s)

Data Mining/methods , Domestic Violence/statistics & numerical data , Police/statistics & numerical data , Adult , Female , Humans , Male
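A knowledge-driven matcher of this kind can be sketched with a small lexicon plus a crude negation check, so that "No cuts were observed" is not counted as an injury. The patterns, the 20-character negation window, and `find_labels` are hypothetical; the published rules are based on syntactical patterns that the abstract does not list.

```python
import re

# Hypothetical lexicon; the published rule set is not reproduced here.
INJURY_PATTERNS = {
    "bruising": re.compile(r"\bbruis(?:e|es|ed|ing)\b", re.I),
    "cut/abrasion": re.compile(r"\b(?:cut|cuts|abrasions?|grazes?)\b", re.I),
    "red marks/signs": re.compile(r"\bred marks?\b", re.I),
}
ABUSE_PATTERNS = {
    "punching": re.compile(r"\bpunch(?:ed|es|ing)?\b", re.I),
    "property damage": re.compile(r"\b(?:smashed|damaged)\b", re.I),
}
# Crude negation cue: "no"/"not"/"nil"/"without" within 20 characters before
# the match, in the same sentence.
NEGATION = re.compile(r"\b(?:no|not|nil|without)\b[^.]{0,20}$", re.I)


def find_labels(narrative: str, patterns: dict[str, re.Pattern]) -> set[str]:
    """Return every label with at least one non-negated match in the narrative."""
    found = set()
    for label, pattern in patterns.items():
        for match in pattern.finditer(narrative):
            sentence_start = narrative.rfind(".", 0, match.start()) + 1
            if not NEGATION.search(narrative[sentence_start:match.start()]):
                found.add(label)
                break
    return found


text = ("The POI punched the victim in the face. "
        "She had bruising to her left arm. No cuts were observed.")
print(find_labels(text, ABUSE_PATTERNS), find_labels(text, INJURY_PATTERNS))
```

The negation check is deliberately naive: "not injured but had bruising" would be wrongly negated, which is one reason real rule sets need more context than a fixed window.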

12.

A rule-based approach to identify patient eligibility criteria for clinical trials from narrative longitudinal records.

Karystianis, George; Florez-Vargas, Oscar; Butler, Tony; Nenadic, Goran.

JAMIA Open ; 2(4): 521-527, 2019 Dec.

Article in English | MEDLINE | ID: mdl-32025649

ABSTRACT

OBJECTIVE: Achieving unbiased recognition of patients eligible for clinical trials from their narrative longitudinal clinical records can be time consuming. We describe and evaluate a knowledge-driven method that identifies whether a patient meets a selected set of 13 clinical trial eligibility criteria from their longitudinal clinical records, which was one of the tasks of the 2018 National NLP Clinical Challenges. MATERIALS AND METHODS: The approach developed uses rules combined with manually crafted dictionaries that characterize the domain. The rules are based on common syntactical patterns observed in text that explicitly indicate or describe a criterion. Certain criteria were classified as "met" only when they occurred within a designated time period prior to the most recent narrative of a patient record, and these were handled through their position in the text. RESULTS: The system was applied to an evaluation set of 86 unseen clinical records and achieved a microaverage F1-score of 89.1% (with a micro F1-score of 87.0% and 91.2% for the patients that met and did not meet the criteria, respectively). Most criteria returned reliable results (drug abuse, 92.5%; HbA1c, 91.3%), while a few proved challenging (eg, advanced coronary artery disease, 72.0%; myocardial infarction within 6 months of the most recent narrative, 47.5%). CONCLUSION: Overall, the results are encouraging and indicate that automated text mining methods can be used to process clinical records to recognize whether a patient meets a set of clinical trial criteria and could be leveraged to reduce the workload of humans screening patients for trials.
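Time-restricted criteria such as myocardial infarction within 6 months of the most recent narrative can be sketched as below. Here each note is assumed to carry a date, and 6 months is approximated as 183 days; the cue list and `mi_within_six_months` are hypothetical, and the authors describe handling such criteria through their position in the text rather than necessarily through note dates.

```python
import re
from datetime import date

# Hypothetical cue list; the abbreviations are matched case-sensitively so that
# ordinary words are not caught, the full term case-insensitively.
MI_PATTERN = re.compile(r"\b(?:(?i:myocardial infarction)|N?STEMI|MI)\b")
WINDOW_DAYS = 183  # "within 6 months", approximated as 183 days (an assumption)


def mi_within_six_months(notes: list[tuple[date, str]]) -> str:
    """notes: (note_date, text) pairs for one patient's longitudinal record.
    A mention counts only if its note is dated within the window before the
    most recent note."""
    latest = max(note_date for note_date, _ in notes)
    for note_date, text in notes:
        if (latest - note_date).days <= WINDOW_DAYS and MI_PATTERN.search(text):
            return "met"
    return "not met"
```

A note that mentions only a historical event (for example "MI in 2005") but is dated inside the window would still be counted as "met" here, which illustrates why this criterion is hard.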

13.

Automatic Extraction of Mental Health Disorders From Domestic Violence Police Narratives: Text Mining Study.

Karystianis, George; Adily, Armita; Schofield, Peter; Knight, Lee; Galdon, Clara; Greenberg, David; Jorm, Louisa; Nenadic, Goran; Butler, Tony.

J Med Internet Res ; 20(9): e11548, 2018 09 13.

Article in English | MEDLINE | ID: mdl-30213778

ABSTRACT

BACKGROUND: Vast numbers of domestic violence (DV) incidents are attended by the New South Wales Police Force each year and recorded as both structured quantitative data and unstructured free text in the WebCOPS (Web-based interface for the Computerised Operational Policing System) database, covering the details of the incident, the victim, and the person of interest (POI). Although the structured data are used for reporting purposes, the free text remains untapped for DV reporting and surveillance purposes. OBJECTIVE: In this paper, we explore whether text mining can automatically identify mental health disorders from this unstructured text. METHODS: We used a training set of 200 recorded DV events to design a knowledge-driven approach based on lexical patterns in text suggesting mental health disorders for POIs and victims. RESULTS: The precision returned from an evaluation set of 100 DV events was 97.5% and 87.1% for mental health disorders related to POIs and victims, respectively. After applying our approach to a large-scale corpus of almost half a million DV events, we identified 77,995 events (15.83%) that mentioned mental health disorders, with 76.96% (60,032/77,995) of those linked to POIs versus 16.47% (12,852/77,995) for the victims and 6.55% (5111/77,995) for both. Depression was the most common mental health disorder mentioned in both victims (22.25%, 3269) and POIs (18.70%, 8944), followed by alcohol abuse for POIs (12.19%, 5829) and various anxiety disorders (eg, panic disorder, generalized anxiety disorder) for victims (11.66%, 1714). CONCLUSIONS: The results suggest that text mining can automatically extract targeted information from police-recorded DV events to support further public health research into the nexus between mental health disorders and DV.

Subject(s)

Data Mining/methods , Domestic Violence/psychology , Mental Health/standards , Adult , Female , Humans , Narration , Police
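Attributing a disorder mention to the POI or the victim is the key step in this method. A minimal regex sketch is shown below, assuming a role word, a short gap that may not contain the other role word, a diagnostic cue, and a disorder term; all patterns and `attribute_disorders` are hypothetical.

```python
import re

# Hypothetical lexical pattern pieces.
ROLE = r"\b(?P<role>POI|victim)\b"
# Up to 40 characters in the same sentence that do not start another role word.
GAP = r"(?:(?!\b(?:POI|victim)\b)[^.]){0,40}?"
CUE = r"\b(?:suffers? from|diagnosed with|treated for|history of)\s+"
DISORDER = r"(?P<disorder>depression|bipolar disorder|schizophrenia|anxiety|alcohol abuse)\b"
PATTERN = re.compile(ROLE + GAP + CUE + DISORDER, re.I)


def attribute_disorders(narrative: str) -> list[tuple[str, str]]:
    """Return (role, disorder) pairs, with the role normalised to 'POI' or 'victim'."""
    results = []
    for m in PATTERN.finditer(narrative):
        role = "POI" if m.group("role").lower() == "poi" else "victim"
        results.append((role, m.group("disorder").lower()))
    return results
```

The gap restriction is what stops "The victim said the POI suffers from depression" from being attributed to the victim.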

14.

Automated screening of research studies for systematic reviews using study characteristics.

Tsafnat, Guy; Glasziou, Paul; Karystianis, George; Coiera, Enrico.

Syst Rev ; 7(1): 64, 2018 04 25.

Article in English | MEDLINE | ID: mdl-29695296

ABSTRACT

BACKGROUND: Screening candidate studies for inclusion in a systematic review is time-consuming when conducted manually. Automation tools could reduce the human effort devoted to screening. Existing methods use supervised machine learning to train classifiers that identify relevant words in the abstracts of candidate articles previously labelled by a human reviewer for inclusion or exclusion. Such classifiers typically reduce the number of abstracts requiring manual screening by about 50%. METHODS: We extracted four key characteristics of observational studies (population, exposure, confounders, and outcomes) from the text of titles and abstracts for all articles retrieved using search strategies from systematic reviews. Our screening method excluded studies if they did not meet a predefined set of characteristics. The method was evaluated using three systematic reviews. Screening results were compared to the actual inclusion list of the reviews. RESULTS: The best screening threshold rule identified studies that mentioned both exposure (E) and outcome (O) in the study abstract. This screening rule excluded 93.7% of retrieved studies with a recall of 98%. CONCLUSIONS: Filtering studies for inclusion in a systematic review based on the detection of key study characteristics in abstracts significantly outperformed standard approaches to automated screening and appears worthy of further development and evaluation.

Subject(s)

Automation , Biomedical Research , Machine Learning , Systematic Reviews as Topic , Humans , Automation/methods
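The exposure-and-outcome screening rule and its two reported measures (fraction of retrieved studies excluded, and recall against the inclusion list) can be expressed as follows. Characteristic detection is assumed to have already happened upstream; the function names, study IDs, detected characteristics, and inclusion list are hypothetical.

```python
def screen(characteristics: dict[int, set[str]],
           required: frozenset[str] = frozenset({"exposure", "outcome"})) -> set[int]:
    """Keep a study only if every required characteristic was detected in its abstract."""
    return {sid for sid, found in characteristics.items() if required <= found}


def screening_performance(kept: set[int], all_ids: set[int], included: set[int]):
    """Fraction of retrieved studies excluded, and recall against the review's inclusion list."""
    excluded_fraction = 1 - len(kept) / len(all_ids)
    recall = len(kept & included) / len(included)
    return excluded_fraction, recall


# Hypothetical detections for five retrieved studies; studies 1 and 4 were included.
characteristics = {
    1: {"population", "exposure", "outcome"},
    2: {"exposure"},
    3: {"outcome", "confounders"},
    4: {"exposure", "outcome"},
    5: set(),
}
kept = screen(characteristics)
print(screening_performance(kept, set(characteristics), {1, 4}))
```

Requiring more characteristics excludes more studies but risks recall; the E-and-O rule was the reported best trade-off.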

15.

Automatic mining of symptom severity from psychiatric evaluation notes.

Karystianis, George; Nevado, Alejo J; Kim, Chi-Hun; Dehghan, Azad; Keane, John A; Nenadic, Goran.

Int J Methods Psychiatr Res ; 27(1)2018 03.

Article in English | MEDLINE | ID: mdl-29271009

ABSTRACT

OBJECTIVES: As electronic mental health records become more widely available, several approaches have been suggested to automatically extract information from free-text narrative aiming to support epidemiological research and clinical decision-making. In this paper, we explore extraction of explicit mentions of symptom severity from initial psychiatric evaluation records. We use the data provided by the 2016 CEGS N-GRID NLP shared task Track 2, which contains 541 records manually annotated for symptom severity according to the Research Domain Criteria. METHODS: We designed and implemented 3 automatic methods: a knowledge-driven approach relying on local lexicalized rules based on common syntactic patterns in text suggesting positive valence symptoms; a machine learning method using a neural network; and a hybrid approach combining the first 2 methods with a neural network. RESULTS: The results on an unseen evaluation set of 216 psychiatric evaluation records showed a performance of 80.1% for the rule-based method, 73.3% for the machine-learning approach, and 72.0% for the hybrid one. CONCLUSIONS: Although more work is needed to improve the accuracy, the results are encouraging and indicate that automated text mining methods can be used to classify mental health symptom severity from free text psychiatric notes to support epidemiological and clinical research.

Subject(s)

Data Mining/methods , Electronic Health Records , Machine Learning , Mental Disorders/physiopathology , Severity of Illness Index , Adult , Humans , Mental Disorders/diagnosis , Neural Networks, Computer

16.

Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes.

Dehghan, Azad; Kovacevic, Aleksandar; Karystianis, George; Keane, John A; Nenadic, Goran.

J Biomed Inform ; 75S: S28-S33, 2017 Nov.

Article in English | MEDLINE | ID: mdl-28602908

ABSTRACT

De-identification of clinical narratives is one of the main obstacles to making healthcare free text available for research. In this paper we describe our experience in expanding and tailoring two existing tools as part of the 2016 CEGS N-GRID Shared Tasks Track 1, which evaluated de-identification methods on a set of psychiatric evaluation notes for up to 25 different types of Protected Health Information (PHI). The methods we used rely on machine learning on either a large or small feature space, with additional strategies, including two-pass tagging and multi-class models, both of which proved beneficial. The results show that the integration of the proposed methods can identify Health Insurance Portability and Accountability Act (HIPAA) defined PHIs with overall F1-scores of ~90% and above. Yet some classes (Profession, Organization) again proved challenging given the variability of expressions used to reference such information.

Subject(s)

Algorithms , Confidentiality , Mental Disorders/psychology , Health Insurance Portability and Accountability Act , Humans , Machine Learning , United States

17.

Evaluation of a rule-based method for epidemiological document classification towards the automation of systematic reviews.

Karystianis, George; Thayer, Kristina; Wolfe, Mary; Tsafnat, Guy.

J Biomed Inform ; 70: 27-34, 2017 06.

Article in English | MEDLINE | ID: mdl-28455150

ABSTRACT

INTRODUCTION: Most data extraction efforts in epidemiology are focused on obtaining targeted information from clinical trials. In contrast, limited research has been conducted on the identification of information from observational studies, a major source of human evidence in many fields, including environmental health. The recognition of key epidemiological information (e.g., exposures) through text mining techniques can assist in the automation of systematic reviews and other evidence summaries. METHOD: We designed and applied a knowledge-driven, rule-based approach to identify targeted information (study design, participant population, exposure, outcome, confounding factors, and the country where the study was conducted) from abstracts of epidemiological studies included in several systematic reviews of environmental health exposures. The rules were based on common syntactical patterns observed in text and are thus not specific to any systematic review. To validate the general applicability of our approach, we compared the data extracted using our approach versus hand curation for 35 epidemiological study abstracts manually selected for inclusion in two systematic reviews. RESULTS: The returned F-score, precision, and recall ranged from 70% to 98%, 81% to 100%, and 54% to 97%, respectively. The highest precision was observed for exposure, outcome and population (100%) while recall was best for exposure and study design with 97% and 89%, respectively. The lowest recall was observed for the population (54%), which also had the lowest F-score (70%). CONCLUSION: Our text-mining approach showed encouraging results for the identification of targeted information from observational epidemiological study abstracts related to environmental exposures. We have demonstrated that rules based on generic syntactic patterns in one corpus can be applied to other observational study designs simply by interchanging the dictionaries used to identify specific characteristics (e.g., outcomes, exposures). At the document level, the recognised information can assist in the selection and categorization of studies included in a systematic review.
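The reported figures can be cross-checked with the F1 formula, F1 = 2PR / (P + R). This short sketch (the helper `f_score` is hypothetical, not part of the study) shows that a population precision of 100% and recall of 54% yield an F-score of about 70%, consistent with the values in the abstract.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives F1, the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Population element: precision 100%, recall 54% (values reported in the abstract).
population_f1 = f_score(precision=1.00, recall=0.54)
print(f"population F1 = {population_f1:.1%}")  # population F1 = 70.1%
```

Because F1 is a harmonic mean, it is pulled toward the smaller of the two values, which is why perfect precision cannot compensate for the low population recall.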

Subject(s)

Automation , Data Mining , Review Literature as Topic

18.

Increasing efficiency of preclinical research by group sequential designs.

Neumann, Konrad; Grittner, Ulrike; Piper, Sophie K; Rex, Andre; Florez-Vargas, Oscar; Karystianis, George; Schneider, Alice; Wellwood, Ian; Siegerink, Bob; Ioannidis, John P A; Kimmelman, Jonathan; Dirnagl, Ulrich.

PLoS Biol ; 15(3): e2001307, 2017 03.

Article in English | MEDLINE | ID: mdl-28282371

ABSTRACT

Despite the potential benefits of sequential designs, studies evaluating treatments or experimental manipulations in preclinical experimental biomedicine almost exclusively use classical block designs. Our aim with this article is to bring the existing methodology of group sequential designs to the attention of researchers in the preclinical field and to clearly illustrate its potential utility. Group sequential designs can offer higher efficiency than traditional methods and are increasingly used in clinical trials. Using simulation of data, we demonstrate that group sequential designs have the potential to improve the efficiency of experimental studies, even when sample sizes are very small, as is currently prevalent in preclinical experimental biomedicine. When simulating data with a large effect size of d = 1 and a sample size of n = 18 per group, sequential frequentist analysis consumes, in the long run, only around 80% of the planned number of experimental units. In larger trials (n = 36 per group), additional stopping rules for futility lead to resource savings of up to 30% compared with block designs. We argue that these savings should be invested to increase sample sizes and hence power, since the currently underpowered experiments in preclinical biomedicine are a major threat to the value and predictiveness of research in this domain.
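To make the idea of a group sequential design concrete, the following is a minimal simulation sketch, not the authors' simulation code. It assumes two equally spaced looks (an interim analysis after half of the planned n = 18 units per group), normally distributed outcomes with a known standard deviation of 1 (so a z-test replaces the t-test), and the tabulated Pocock critical value of 2.178 for two looks at a two-sided overall alpha of 0.05. It models only stopping for efficacy, not futility, so its numbers will not match the figures above exactly.

```python
import random
from statistics import fmean

# Assumption: tabulated Pocock critical value for K = 2 equally spaced looks,
# two-sided overall alpha = 0.05; the same boundary is used at both looks.
POCOCK_Z_K2 = 2.178


def run_experiment(rng: random.Random, d: float, n: int, z_crit: float) -> tuple[int, bool]:
    """Simulate one two-stage experiment with n units planned per group.

    Control outcomes ~ N(0, 1), treated outcomes ~ N(d, 1); sigma = 1 is treated
    as known, so a z-test stands in for the t-test. Only efficacy stopping is modelled.
    Returns (units used across both groups, whether H0 was rejected).
    """
    m = n // 2  # units per group at the interim look
    ctrl = [rng.gauss(0.0, 1.0) for _ in range(n)]
    trt = [rng.gauss(d, 1.0) for _ in range(n)]
    z1 = (fmean(trt[:m]) - fmean(ctrl[:m])) / (2 / m) ** 0.5
    if abs(z1) >= z_crit:  # boundary crossed at the interim look: stop early
        return 2 * m, True
    z2 = (fmean(trt) - fmean(ctrl)) / (2 / n) ** 0.5
    return 2 * n, abs(z2) >= z_crit


def simulate(d: float, n: int = 18, reps: int = 20_000, seed: int = 1) -> tuple[float, float]:
    """Return (mean fraction of planned units used, rejection rate) over reps runs."""
    rng = random.Random(seed)
    used = rejected = 0
    for _ in range(reps):
        units, reject = run_experiment(rng, d, n, POCOCK_Z_K2)
        used += units
        rejected += reject
    return used / (reps * 2 * n), rejected / reps


frac_h1, power = simulate(d=1.0)
frac_h0, type1 = simulate(d=0.0)
print(f"d = 1: {frac_h1:.0%} of planned units used, power {power:.2f}")
print(f"d = 0: type I error {type1:.3f}")
```

The type I error under d = 0 is a useful sanity check: with a correct boundary it should come out close to 0.05, whereas reusing 1.96 at both looks would inflate it above 0.05.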

Subject(s)

Biomedical Research , Research Design

19.

Bias in the reporting of sex and age in biomedical research on mouse models.

Flórez-Vargas, Oscar; Brass, Andy; Karystianis, George; Bramhall, Michael; Stevens, Robert; Cruickshank, Sheena; Nenadic, Goran.

Elife ; 5, 2016 Mar 03.

Article in English | MEDLINE | ID: mdl-26939790

ABSTRACT

In animal-based biomedical research, both the sex and the age of the animals studied affect disease phenotypes by modifying their susceptibility, presentation and response to treatment. The accurate reporting of experimental methods and materials, including the sex and age of animals, is essential so that other researchers can build on the results of such studies. Here we use text mining to study 15,311 research papers in which mice were the focus of the study. We find that the percentage of papers reporting the sex and age of mice has increased over the past two decades; however, only about 50% of the papers published in 2014 reported these two variables. We also compared the quality of reporting in six preclinical research areas and found evidence for different levels of sex-bias in these areas: the strongest male-bias was observed in cardiovascular disease models and the strongest female-bias was found in infectious disease models. These results demonstrate the ability of text mining to contribute to the ongoing debate about the reproducibility of research, and confirm the need to continue efforts to improve the reporting of experimental methods and materials.
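A minimal sketch of the kind of pattern matching that can flag whether a methods text reports sex and age. The regular expressions, function names (`sex_reported`, `age_reported`), and example sentences are hypothetical; the study's actual extraction rules are not described in this abstract and would need to cover many more phrasings (ages in days or months, "adult", strain-specific wording, and so on).

```python
import re

# Hypothetical patterns; real reporting-detection rules would be far broader.
MALE = re.compile(r"\bmales?\b", re.IGNORECASE)
FEMALE = re.compile(r"\bfemales?\b", re.IGNORECASE)
AGE = re.compile(
    r"\b\d+(?:\s*(?:-|to)\s*\d+)?[\s-]*(?:day|week|month)s?[\s-]*old\b"  # "8-week-old", "6-8 weeks old"
    r"|\baged\s+\d+",                                                      # "aged 10 weeks"
    re.IGNORECASE,
)


def sex_reported(text: str) -> str:
    """Classify which sex(es) a methods text mentions."""
    has_male, has_female = bool(MALE.search(text)), bool(FEMALE.search(text))
    if has_male and has_female:
        return "both"
    if has_male:
        return "male"
    if has_female:
        return "female"
    return "not reported"


def age_reported(text: str) -> bool:
    """True if the text contains an explicit numeric age for the animals."""
    return bool(AGE.search(text))


methods_texts = [
    "We used 8-week-old female C57BL/6 mice.",
    "Male BALB/c mice aged 10 weeks were infected.",
    "Mice were housed under standard conditions.",
]
both = [sex_reported(t) != "not reported" and age_reported(t) for t in methods_texts]
print(f"{sum(both)}/{len(both)} texts report both sex and age")  # 2/3 texts report both sex and age
```

The word boundaries matter here: without `\b`, the pattern for "male" would also match inside "female" and inflate the male counts that a sex-bias comparison depends on.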

Subject(s)

Biomedical Research/methods , Disease Models, Animal , Selection Bias , Age Distribution , Animals , Data Mining , Mice , Sex Distribution

20.

Modelling and extraction of variability in free-text medication prescriptions from an anonymised primary care electronic medical record research database.

Karystianis, George; Sheppard, Therese; Dixon, William G; Nenadic, Goran.

BMC Med Inform Decis Mak ; 16: 18, 2016 Feb 09.

Article in English | MEDLINE | ID: mdl-26860263

ABSTRACT

BACKGROUND: Free-text medication prescriptions contain detailed instruction information that is key when preparing drug data for analysis. The objective of this study was to develop a novel model and automated text-mining method to extract detailed structured medication information from free-text prescriptions and explore their variability (e.g. optional dosages) in primary care research databases. METHODS: We introduce a prescription model that provides minimum and maximum values for dose number, frequency and interval, allowing modelling of variability and flexibility within a drug prescription. We developed a text mining system that relies on rules to extract such structured information from free-text prescription dosage instructions. The system was applied to medication prescriptions from an anonymised primary care electronic record database (Clinical Practice Research Datalink, CPRD). RESULTS: We evaluated our approach on a test set of 220 CPRD free-text prescription directions. The system achieved an overall accuracy of 91% at the prescription level, with 97% accuracy across the attribute levels. We then analysed more than 56,000 of the most common free-text prescriptions from CPRD records and found that 1 in 4 has inherent variability, i.e. a choice in taking medication specified by different minimum and maximum doses, duration or frequency. CONCLUSIONS: Our approach provides an accurate, automated way of coding prescription free-text information, including information about flexibility and variability within a prescription. The method allows the researcher to decide how best to prepare the prescription data for drug efficacy and safety analyses in any given setting, and to test various scenarios and their impact.
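The min/max prescription model can be sketched as a small data structure plus two extraction rules. This is a hypothetical illustration, not the authors' system: the `Prescription` fields, the regular expressions, and the tiny number-word table are assumptions, and they cover only instructions of the form "take 1-2 tablets three times a day".

```python
import re
from dataclasses import dataclass
from typing import Optional

NUM = r"\d+(?:\.\d+)?|one|two|three|four"  # hypothetical, deliberately tiny vocabulary
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4}
FREQ_WORDS = {"once": 1, "twice": 2}

# Dose: a number or a range ("1-2", "1 to 2", "1 or 2") followed by a dose unit.
DOSE_RE = re.compile(
    rf"\b(?P<d1>{NUM})(?:\s*(?:-|to|or)\s*(?P<d2>{NUM}))?\s*(?:tablet|tab|capsule|puff)s?\b",
    re.IGNORECASE,
)
# Frequency: "N times" / "N-M times" / "once" / "twice", then "a|per|each" + interval.
FREQ_RE = re.compile(
    rf"(?:\b(?P<f1>{NUM})(?:\s*(?:-|to|or)\s*(?P<f2>{NUM}))?\s*times|\b(?P<fw>once|twice))"
    r"\s+(?:a|per|each)\s+(?P<unit>day|week|hour)\b",
    re.IGNORECASE,
)


@dataclass
class Prescription:
    """Hypothetical min/max model of one free-text dosage instruction."""
    dose_min: float
    dose_max: float
    freq_min: Optional[int] = None  # administrations per interval
    freq_max: Optional[int] = None
    interval: Optional[str] = None  # e.g. "day"

    @property
    def is_variable(self) -> bool:
        """Variable if the instruction allows a choice of dose or frequency."""
        return self.dose_min != self.dose_max or self.freq_min != self.freq_max


def _num(token: str) -> float:
    """Convert a digit string or a number word to a float."""
    return float(WORD_NUMBERS.get(token.lower(), token))


def parse(instruction: str) -> Optional[Prescription]:
    """Extract dose and frequency ranges; return None if no dose is found."""
    dose = DOSE_RE.search(instruction)
    if dose is None:
        return None
    d_min = _num(dose["d1"])
    rx = Prescription(dose_min=d_min, dose_max=_num(dose["d2"]) if dose["d2"] else d_min)
    freq = FREQ_RE.search(instruction)
    if freq is not None:
        if freq["fw"]:
            f_min = f_max = FREQ_WORDS[freq["fw"].lower()]
        else:
            f_min = int(_num(freq["f1"]))
            f_max = int(_num(freq["f2"])) if freq["f2"] else f_min
        rx.freq_min, rx.freq_max, rx.interval = f_min, f_max, freq["unit"].lower()
    return rx


for text in ["Take 1-2 tablets three times a day", "one tablet twice a day",
             "take 2 capsules 3 or 4 times a day"]:
    rx = parse(text)
    print(text, "->", rx, "variable" if rx.is_variable else "fixed")
```

An instruction counts as variable when any minimum differs from its maximum, mirroring the abstract's definition of inherent variability; keeping both bounds lets an analyst later choose the minimum, the maximum, or an average when deriving daily doses.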

Subject(s)

Biomedical Research/methods , Databases, Factual , Electronic Health Records , Electronic Prescribing , Medical Informatics Applications , Primary Health Care , Data Anonymization , Humans
